ARROW-6549: [C++] Switch to jemalloc 5.2.x #5365
@ursabot benchmark |
Here is a macro-benchmark of reading a 186 MB CSV file using Python (
So it seems that jemalloc 5.2 may bring significant multi-thread allocation improvement. @xhochy @wesm |
Results for
(on a Ryzen 7 1700, Ubuntu 18.04, gcc 7.4.0) |
Results for
|
AMD64 Ubuntu 18.04 C++ Benchmark (#61467) builder has been succeeded. Revision: e845e32
====================================================== ================ ================ ============
benchmark baseline contender change
====================================================== ================ ================ ============
VisitBits/8192 1.07781e+08 1.07379e+08 -0.00372867
BitmapWriter/8192 7.15675e+07 7.14366e+07 -0.00182912
FirstTimeBitmapWriter/8192 1.03104e+08 1.0292e+08 -0.00178664
CopyBitmapWithOffset/8192 5.98019e+08 5.96996e+08 -0.00171053
BitmapReader/8192 1.10054e+08 1.09855e+08 -0.00181152
GenerateBits/8192 9.29735e+07 9.26924e+07 -0.00302338
GenerateBitsUnrolled/8192 1.39046e+08 1.414e+08 0.0169262
CopyBitmapWithoutOffset/8192 6.6857e+10 6.67039e+10 -0.00229046
VisitBitsUnrolled/8192 2.99562e+08 2.99039e+08 -0.00174737
TypeEqualsWithMetadata 4.21107e+07 4.25519e+07 0.0104772
SchemaEqualsWithMetadata 3.43963e+07 3.43837e+07 -0.000367477
SchemaEquals 3.88801e+07 3.88826e+07 6.51288e-05
TypeEqualsComplex 4.8829e+07 4.87996e+07 -0.000601411
TypeEqualsSimple 7.30582e+07 7.29571e+07 -0.00138386
ParallelMemoryCopy/threads:32/real_time 2.37693e+10 2.3736e+10 -0.0014002
ParallelMemoryCopy/threads:1/real_time 7.54748e+09 7.52592e+09 -0.0028566
ParallelMemoryCopy/threads:40/real_time 2.33125e+10 2.3135e+10 -0.00761385
BufferOutputStreamSmallWrites/real_time 1.17461e+10 1.33316e+10 0.13498
ParallelMemoryCopy/threads:8/real_time 2.50697e+10 2.50092e+10 -0.00241529
BufferOutputStreamTinyWrites/real_time 4.22158e+08 4.21612e+08 -0.00129325
- ParallelMemoryCopy/threads:4/real_time 2.4153e+10 2.20447e+10 -0.0872892
BufferOutputStreamLargeWrites/real_time 1.32615e+10 1.31972e+10 -0.00484125
ParallelMemoryCopy/threads:2/real_time 1.04534e+10 1.08204e+10 0.0351113
ParallelMemoryCopy/threads:16/real_time 2.42289e+10 2.43053e+10 0.00315518
UniqueInt64WithNulls/4194304/10240 1.30154e+09 1.3505e+09 0.0376156
BuildStringDictionary 5.22615e+07 5.35353e+07 0.0243742
UniqueInt64WithNulls/4194304/1024 2.02566e+09 2.06992e+09 0.0218493
UniqueString100bytes/4194304/1024 2.10611e+09 2.11604e+09 0.0047149
UniqueString10bytes/4194304/1024 5.30424e+08 5.31164e+08 0.0013947
UniqueUInt8NoNulls/4194304/200 1.23275e+09 1.23554e+09 0.00226235
UniqueInt64NoNulls/4194304/10240 1.68838e+09 1.75714e+09 0.0407267
UniqueUInt8WithNulls/4194304/200 4.73741e+08 4.74357e+08 0.00130097
BuildDictionary 8.67283e+08 8.68186e+08 0.00104068
UniqueInt64NoNulls/4194304/1024 3.04604e+09 2.93618e+09 -0.0360649
UniqueString100bytes/4194304/10240 1.19991e+09 1.24525e+09 0.0377842
UniqueString10bytes/4194304/10240 2.99307e+08 3.07488e+08 0.0273317
BufferedOutputStreamLargeWritesToPipe/real_time 2.37666e+09 2.38009e+09 0.00144467
BufferedOutputStreamSmallWritesToNull/real_time 1.13139e+09 1.13283e+09 0.00127495
FileOutputStreamSmallWritesToNull/real_time 6.33922e+07 6.32965e+07 -0.00151039
BufferedOutputStreamSmallWritesToPipe/real_time 7.52551e+08 7.42311e+08 -0.013607
FileOutputStreamSmallWritesToPipe/real_time 3.72528e+07 3.73065e+07 0.00144254
FileOutputStreamLargeWritesToPipe/real_time 2.39946e+09 2.34043e+09 -0.0245978
TakeInt64/1048576/1/min_time:1.000 4.74425e+08 4.90142e+08 0.0331295
TakeInt64/32768/1/min_time:1.000 6.04803e+08 6.06433e+08 0.00269395
TakeInt64VsFilter/32768/1/min_time:1.000 2.2238e+09 2.22454e+09 0.000332469
TakeString/32768/0/min_time:1.000 1.88563e+09 1.92809e+09 0.022514
TakeInt64VsFilter/1048576/1/min_time:1.000 2.36772e+09 2.37462e+09 0.0029175
TakeString/8388608/1/min_time:1.000 1.14393e+09 1.17467e+09 0.0268774
TakeFixedSizeList1Int64/32768/0/min_time:1.000 1.72796e+08 1.73326e+08 0.00306883
TakeFixedSizeList1Int64/32768/1/min_time:1.000 1.65906e+08 1.6658e+08 0.00406374
TakeString/32768/10/min_time:1.000 1.72581e+09 1.75257e+09 0.0155097
TakeInt64/32768/50/min_time:1.000 3.76121e+08 3.6981e+08 -0.0167782
TakeInt64VsFilter/32768/10/min_time:1.000 1.4778e+09 1.48427e+09 0.00437996
TakeInt64VsFilter/8388608/1/min_time:1.000 2.36672e+09 2.37386e+09 0.00301372
TakeString/1048576/1/min_time:1.000 1.34072e+09 1.39017e+09 0.0368845
TakeFixedSizeList1Int64/32768/10/min_time:1.000 1.63546e+08 1.64102e+08 0.00340082
TakeInt64/8388608/1/min_time:1.000 4.18828e+08 4.32568e+08 0.0328068
TakeInt64VsFilter/32768/0/min_time:1.000 2.43805e+09 2.44576e+09 0.00315919
TakeFixedSizeList1Int64/1048576/1/min_time:1.000 1.26298e+08 1.28979e+08 0.0212284
TakeInt64/32768/10/min_time:1.000 5.19794e+08 5.20704e+08 0.00175072
TakeString/32768/50/min_time:1.000 1.12374e+09 1.14455e+09 0.018524
TakeString/32768/1/min_time:1.000 1.62253e+09 1.63965e+09 0.0105544
TakeFixedSizeList1Int64/32768/50/min_time:1.000 1.48295e+08 1.48739e+08 0.00299176
TakeInt64VsFilter/32768/50/min_time:1.000 8.16256e+08 8.1189e+08 -0.00534934
TakeFixedSizeList1Int64/8388608/1/min_time:1.000 1.17624e+08 1.20053e+08 0.0206492
TakeInt64/32768/0/min_time:1.000 6.29186e+08 6.30629e+08 0.0022947
TimestampParsing<TimeUnit::SECOND> 5.57023e+07 5.57807e+07 0.00140837
FloatParsing<FloatType> 9.0668e+06 9.07858e+06 0.00129912
IntegerParsing<UInt16Type> 3.04247e+08 3.05397e+08 0.00378227
TimestampParsing<TimeUnit::NANO> 5.37585e+07 5.38465e+07 0.00163671
IntegerParsing<Int16Type> 2.36003e+08 2.36221e+08 0.000924136
IntegerParsing<UInt32Type> 3.19756e+08 3.2081e+08 0.00329493
IntegerParsing<Int8Type> 2.46409e+08 2.46764e+08 0.00144044
IntegerParsing<Int32Type> 1.83964e+08 1.84267e+08 0.00164543
IntegerParsing<UInt8Type> 4.46707e+08 4.47722e+08 0.00227308
TimestampParsing<TimeUnit::MILLI> 5.42119e+07 5.42678e+07 0.00102983
FloatParsing<DoubleType> 2.00043e+07 2.00435e+07 0.00195585
IntegerParsing<UInt64Type> 2.34148e+08 2.34361e+08 0.000911207
IntegerParsing<Int64Type> 1.4559e+08 1.4606e+08 0.00322494
TimestampParsing<TimeUnit::MICRO> 5.34906e+07 5.35633e+07 0.0013585
TrieLookupFound 1.14618e+08 1.16171e+08 0.0135445
- TrieLookupNotFound 2.83641e+08 2.28253e+08 -0.195275
HashIntegers 8.81392e+09 8.81386e+09 -7.27231e-06
HashLargeStrings 1.20018e+10 1.18807e+10 -0.0100904
HashSmallStrings 2.48089e+09 2.47653e+09 -0.00175449
HashMediumStrings 6.35033e+09 6.34297e+09 -0.00115959
ValidateLargeAscii 1.8748e+10 1.87727e+10 0.00132052
ValidateSmallAlmostAscii 3.15405e+09 3.15462e+09 0.000179343
ValidateLargeNonAscii 1.5895e+09 1.58973e+09 0.000145659
- ValidateSmallNonAscii 1.49769e+09 1.3436e+09 -0.102882
ValidateTinyNonAscii 1.32174e+09 1.32166e+09 -6.04663e-05
ValidateSmallAscii 8.20921e+09 8.20961e+09 4.89301e-05
ValidateLargeAlmostAscii 3.1981e+09 3.19835e+09 8.09469e-05
ValidateTinyAscii 3.85215e+09 3.84377e+09 -0.0021753
CompareArrayArrayKernel/32768/10 2.05349e+10 2.0379e+10 -0.00758768
- CompareArrayScalarKernel/32768/10 1.26337e+10 1.18194e+10 -0.0644504
CompareArrayScalarKernel/32768/50 1.05602e+10 1.1589e+10 0.0974232
CompareArrayArrayKernel/32768/1 2.04325e+10 2.05987e+10 0.00813217
CompareArrayArrayKernel/32768/50 2.05181e+10 2.05285e+10 0.000506319
CompareArrayScalarKernel/32768/0 1.18268e+10 1.28373e+10 0.0854427
CompareArrayScalarKernel/32768/1 1.132e+10 1.21216e+10 0.0708079
CompareArrayArrayKernel/32768/0 2.20585e+10 2.20663e+10 0.000354719
FloatConversion 2.48589e+07 2.48902e+07 0.00125976
StringConversion 6.57084e+07 6.5637e+07 -0.00108536
Decimal128Conversion 1.14456e+07 1.1444e+07 -0.000136588
Int64Conversion 5.71114e+07 5.71772e+07 0.00115285
SumKernel/32768/0 1.84054e+10 1.8265e+10 -0.00762826
SumKernel/32768/10 1.32769e+10 1.32482e+10 -0.00215986
SumKernel/32768/1 1.53767e+10 1.52817e+10 -0.00617827
SumKernel/32768/50 1.15035e+10 1.14894e+10 -0.00122603
BinaryBitOp 1.05461e+08 1.04935e+08 -0.00498816
BinaryMathOpAggregate 1.0577e+07 1.06907e+07 0.0107505
Constants 4.07316e+07 4.05163e+07 -0.00528619
FromString 1.36385e+07 1.36482e+07 0.000713888
BinaryCompareOp 7.18793e+07 7.2261e+07 0.00531018
UnaryOp 8.81945e+07 8.80089e+07 -0.00210446
BinaryMathOp 2.78412e+07 2.77963e+07 -0.00161293
BinaryCompareOpConstant 6.49211e+07 6.21469e+07 -0.0427325
SortToIndicesInt64/1048576/1/min_time:1.000 6.87089e+07 6.85906e+07 -0.00172265
SortToIndicesInt64/32768/50/min_time:1.000 1.61129e+08 1.61192e+08 0.000392657
SortToIndicesInt64/32768/10/min_time:1.000 9.3528e+07 9.34592e+07 -0.00073606
SortToIndicesInt64/8388608/1/min_time:1.000 5.99098e+07 5.98487e+07 -0.00101989
SortToIndicesInt64/32768/1/min_time:1.000 8.66826e+07 8.71045e+07 0.00486761
SortToIndicesInt64/32768/0/min_time:1.000 8.72577e+07 8.72488e+07 -0.000102011
DetectUIntWidthNoNulls 2.35736e+10 2.35112e+10 -0.00264441
DetectIntWidthNoNulls 2.04091e+10 2.03832e+10 -0.00126539
DetectIntWidthNulls 1.07373e+10 1.07388e+10 0.000140204
DetectUIntWidthNulls 1.28387e+10 1.28363e+10 -0.000181795
FilterString/32768/0/min_time:1.000 5.07058e+09 4.91931e+09 -0.0298319
FilterInt64/32768/50/min_time:1.000 4.11019e+08 4.10888e+08 -0.000318984
FilterFixedSizeList1Int64/8388608/1/min_time:1.000 3.80219e+08 3.79409e+08 -0.00213096
FilterFixedSizeList1Int64/32768/10/min_time:1.000 3.37975e+08 3.37712e+08 -0.000777484
FilterFixedSizeList1Int64/32768/50/min_time:1.000 1.85149e+08 1.83802e+08 -0.0072791
FilterString/8388608/1/min_time:1.000 3.90376e+09 3.81026e+09 -0.023952
FilterString/1048576/1/min_time:1.000 3.66454e+09 3.69179e+09 0.00743729
FilterFixedSizeList1Int64/1048576/1/min_time:1.000 3.80925e+08 3.82714e+08 0.00469731
FilterInt64/32768/10/min_time:1.000 7.78435e+08 7.74795e+08 -0.0046764
FilterInt64/32768/1/min_time:1.000 8.47656e+08 8.5532e+08 0.00904099
FilterInt64/8388608/1/min_time:1.000 6.6595e+08 6.67596e+08 0.00247147
FilterInt64/32768/0/min_time:1.000 8.40875e+08 8.36243e+08 -0.00550858
FilterFixedSizeList1Int64/32768/0/min_time:1.000 4.70932e+08 4.72803e+08 0.00397245
FilterFixedSizeList1Int64/32768/1/min_time:1.000 4.05408e+08 4.02649e+08 -0.00680562
FilterString/32768/1/min_time:1.000 4.95204e+09 4.96272e+09 0.00215661
FilterString/32768/10/min_time:1.000 4.52546e+09 4.44782e+09 -0.0171571
FilterString/32768/50/min_time:1.000 2.22963e+09 2.19289e+09 -0.0164786
FilterInt64/1048576/1/min_time:1.000 6.6937e+08 6.69143e+08 -0.000338985
BuildInt64DictionaryArraySequential 3.57793e+08 3.55883e+08 -0.00533902
BuildFixedSizeBinaryArray 3.89211e+08 3.95862e+08 0.0170869
BufferBuilderLargeWrites/real_time 2.40252e+09 2.33018e+09 -0.0301095
BuildBooleanArrayNoNulls 5.56989e+09 5.50649e+09 -0.0113822
ArrayDataConstructDestruct 100866 100512 -0.00350968
BuildAdaptiveIntNoNullsScalarAppend 1.43918e+09 1.43704e+09 -0.00149109
BuildIntArrayNoNulls 3.07021e+09 3.02253e+09 -0.0155277
BuildBinaryArray 3.24545e+08 3.32854e+08 0.0256022
BuildAdaptiveIntNoNulls 1.07476e+10 1.06152e+10 -0.012321
BufferBuilderTinyWrites/real_time 4.78079e+08 4.77451e+08 -0.00131337
BuildInt64DictionaryArraySimilar 2.72678e+08 2.70806e+08 -0.00686518
BuildChunkedBinaryArray 2.72078e+08 2.71715e+08 -0.00133257
BufferBuilderSmallWrites/real_time 3.61001e+09 3.51775e+09 -0.0255559
BuildInt64DictionaryArrayRandom 3.21505e+08 3.52824e+08 0.0974141
BuildDecimalArray 5.91154e+08 5.91009e+08 -0.000245778
BuildStringDictionaryArray 2.45288e+08 2.46456e+08 0.00476031
ReadJSONBlockWithSchemaMultiThread/real_time 1.89042e+08 1.87233e+08 -0.00956962
ChunkJSONLineDelimited 109.917 109.879 -0.00034497
- ChunkJSONPrettyPrinted 8.40633e+07 7.68993e+07 -0.0852212
ParseJSONBlockWithSchema 4.18418e+07 4.08108e+07 -0.02464
ReadJSONBlockWithSchemaSingleThread 3.78651e+07 3.92098e+07 0.0355126
WriteRecordBatch/64/real_time 1.06445e+10 1.02481e+10 -0.0372385
WriteRecordBatch/16/real_time 1.26426e+10 1.2134e+10 -0.0402267
WriteRecordBatch/8192/real_time 3.27965e+08 3.2295e+08 -0.0152935
ReadRecordBatch/4/real_time 6.29799e+11 6.24777e+11 -0.00797472
ReadRecordBatch/1/real_time 1.18357e+12 1.16548e+12 -0.0152862
WriteRecordBatch/1/real_time 1.32558e+10 1.27121e+10 -0.0410175
WriteRecordBatch/1024/real_time 2.51259e+09 2.4623e+09 -0.0200146
ReadRecordBatch/16/real_time 2.10718e+11 2.09605e+11 -0.00527934
ReadRecordBatch/8192/real_time 2.54e+08 2.50965e+08 -0.0119485
WriteRecordBatch/4/real_time 1.32514e+10 1.26804e+10 -0.0430911
ReadRecordBatch/1024/real_time 2.66116e+09 2.59249e+09 -0.0258042
ReadRecordBatch/64/real_time 5.59284e+10 5.58811e+10 -0.000846173
ReadRecordBatch/4096/real_time 4.92831e+08 4.87308e+08 -0.0112069
WriteRecordBatch/4096/real_time 6.61193e+08 6.49704e+08 -0.0173763
WriteRecordBatch/256/real_time 6.64758e+09 6.47117e+09 -0.0265377
ReadRecordBatch/256/real_time 1.12464e+10 1.07343e+10 -0.0455329
ThreadPoolSpawn/threads:4/task_cost:10000/real_time 436570 433540 -0.0069404
ThreadedTaskGroup/threads:8/task_cost:10000/real_time 196185 194601 -0.0080736
ThreadedTaskGroup/threads:1/task_cost:10000/real_time 126288 126160 -0.00101618
ThreadedTaskGroup/threads:1/task_cost:100000/real_time 12910.4 12910 -2.9045e-05
ThreadPoolSpawn/threads:4/task_cost:100000/real_time 37147.6 39486.5 0.0629644
ThreadedTaskGroup/threads:8/task_cost:100000/real_time 49990.4 49991.5 2.23484e-05
ThreadedTaskGroup/threads:4/task_cost:100000/real_time 47222.7 47074 -0.00314901
ThreadPoolSpawn/threads:1/task_cost:100000/real_time 12590.1 12603.8 0.00108821
ThreadedTaskGroup/threads:2/task_cost:100000/real_time 25416.2 25417 3.07305e-05
- ThreadedTaskGroup/threads:8/task_cost:1000/real_time 211488 187692 -0.112517
ThreadedTaskGroup/threads:1/task_cost:1000/real_time 857807 840825 -0.0197967
ThreadedTaskGroup/threads:4/task_cost:10000/real_time 450754 445731 -0.0111433
ThreadPoolSpawn/threads:1/task_cost:1000/real_time 921863 905253 -0.0180179
ThreadedTaskGroup/threads:2/task_cost:1000/real_time 231811 255898 0.103906
ThreadPoolSpawn/threads:8/task_cost:1000/real_time 208669 216882 0.0393587
ThreadPoolSpawn/threads:1/task_cost:10000/real_time 124893 124703 -0.0015147
ThreadPoolSpawn/threads:2/task_cost:1000/real_time 385242 394750 0.0246785
ThreadPoolSpawn/threads:4/task_cost:1000/real_time 207837 208191 0.00170517
ThreadedTaskGroup/threads:4/task_cost:1000/real_time 194233 195368 0.00584108
ThreadedTaskGroup/threads:2/task_cost:10000/real_time 235410 237657 0.0095457
SerialTaskGroup/task_cost:100000/real_time 13064 13064 -2.49535e-06
ThreadPoolSpawn/threads:2/task_cost:10000/real_time 210322 220553 0.0486437
ThreadPoolSpawn/threads:8/task_cost:100000/real_time 51622.4 51607.3 -0.000290675
SerialTaskGroup/task_cost:10000/real_time 130228 130226 -1.44551e-05
ThreadPoolSpawn/threads:2/task_cost:100000/real_time 22110.6 22099.7 -0.000491798
ThreadPoolSpawn/threads:8/task_cost:10000/real_time 273366 275064 0.00620986
SerialTaskGroup/task_cost:1000/real_time 1.25775e+06 1.25776e+06 3.98338e-06
ChunkCSVEscapedBlock 9.35103e+08 9.3539e+08 0.000307353
ChunkCSVNoNewlinesBlock 11.2806 11.3988 0.0104752
ChunkCSVQuotedBlock 8.48658e+08 8.48592e+08 -7.77781e-05
ParseCSVQuotedBlock 3.30078e+08 3.29997e+08 -0.000245724
ParseCSVEscapedBlock 2.86795e+08 2.8676e+08 -0.000122009
====================================================== ================ ================ ============ |
The ursabot benchmark numbers above are slightly noisy. I don't think the trie or utf8 benchmarks can be impacted by memory allocator changes. It seems there are no actual regressions, at least on that machine. |
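For reference, the `change` column in the table above is consistent with `(contender - baseline) / baseline`. The sketch below recomputes it for three rows copied from the table and flags drops beyond a threshold. The 5% threshold and the helper name `relative_change` are assumptions for illustration. The thread does not say how ursabot decides which rows get the leading `-` marker, only that the marked rows all show a drop larger than about 5%.

```python
def relative_change(baseline: float, contender: float) -> float:
    """Relative change of contender versus baseline (negative means a lower value)."""
    return (contender - baseline) / baseline


# (baseline, contender) pairs copied from the ursabot table above.
rows = {
    "TrieLookupNotFound": (2.83641e+08, 2.28253e+08),
    "ValidateSmallNonAscii": (1.49769e+09, 1.3436e+09),
    "BufferOutputStreamSmallWrites/real_time": (1.17461e+10, 1.33316e+10),
}

# Hypothetical threshold: inferred from which rows carry a leading "-",
# not documented anywhere in this thread.
THRESHOLD = 0.05

changes = {name: relative_change(b, c) for name, (b, c) in rows.items()}
flagged = sorted(name for name, change in changes.items() if change < -THRESHOLD)

for name, change in changes.items():
    marker = "-" if name in flagged else " "
    print(f"{marker} {name:45s} {change:+.6f}")
```

A flagged row is only a candidate regression. As noted above, the trie and utf8 benchmarks are unlikely to depend on the allocator, so their flags are more plausibly noise.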
@wesm You may want to run some benchmarks on your machine. |
e845e32 to a571cb9
Thanks @pitrou for investigating this. I will run a benchmark comparison on my machine (pretty newish i9-9960X) once I address https://issues.apache.org/jira/browse/ARROW-6559 which I just opened. |
I also just rebased so hopefully we get a passing build now that the CI failure from ARROW-6509 is unblocked |
a571cb9 to e9c1847
Here are the benchmark results on my machine (I needed ARROW-6559 to run them): https://gist.github.com/wesm/7501f688ee221ea826a56092dc02c471
I didn't see anything significant, so I'm merging this. |
It looks like we have a UBSAN failure
@emkornfield any clue why this would only show up here and not on master (this failure isn't occurring there)? |
@wesm This has to do with Parquet reading the data for null values. This data seems to be uninitialized, so it may take different values depending on the memory allocator. IMO this may point to a problem in Parquet: it shouldn't serialize uninitialized data (that data may hold user secrets). Note that Arrow itself is not at fault: null data is zero-initialized in |
By the way, 2776655897 is supposed to be a number of days since Unix epoch. If we add
This points exactly to uninitialized memory, flagged thanks to jemalloc's "junk" option.
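One way to check this reading is sketched below. It assumes the offset in question is the Julian day number of the Unix epoch, 2440588. That value is not stated in the thread and is supplied here as an assumption. With jemalloc's "junk" option, newly allocated memory is filled with the byte `0xa5`, so an uninitialized 32-bit field would read as `0xa5a5a5a5`.

```python
# Value quoted in the comment above, interpreted as days since the Unix epoch.
observed_days = 2776655897

# Assumption (not stated in the thread): the offset is the Julian day number
# of 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588

# Undo the epoch shift to recover the raw 32-bit value that was read.
raw = observed_days + JULIAN_DAY_OF_UNIX_EPOCH
raw_bytes = raw.to_bytes(4, "little")

# jemalloc's junk filling writes 0xa5 into freshly allocated memory.
all_junk = all(byte == 0xA5 for byte in raw_bytes)

print(hex(raw), all_junk)
```

Under that assumption, every byte of the raw value matches the junk pattern, which is what one would expect from memory that was allocated but never written.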
|
Ok, the uninitialized memory issue is actually on the decoding side. I will push a fix and let you double-check, @wesm. |
Passing Travis-CI build at https://travis-ci.org/pitrou/arrow/builds/585577226 |
Revert "ARROW-6478: [C++] Revert to jemalloc stable-4 until we understand 5.2.x performance issues"
This reverts commit 53c5af0.
In addition, configure jemalloc to fix the performance regression.
8fdb0bd to 71dd00f
Codecov Report
@@ Coverage Diff @@
## master #5365 +/- ##
==========================================
+ Coverage 88.58% 89.14% +0.55%
==========================================
Files 950 758 -192
Lines 126213 110756 -15457
Branches 1495 0 -1495
==========================================
- Hits 111808 98732 -13076
+ Misses 14040 12024 -2016
+ Partials 365 0 -365
|
+1